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A METHOD TOR ISOLATING A FAULT FROM ERROR MESSAGES 

Technical field 

The present invention pertains to a method, a use of the method, a system, and a 
use of the system, a computer program code element, and a computer Teadable medium for 
isolating a fault from a plurality of error messages in a system log of a process controlled by 
an object oriented program. 

Prior art 

Developing control systems for complex systems is a difficult and increasingly 
important task. Larger control systems have traditionally been developed using structured 
analysis and functional decomposition, see for example T. DeMarco. Structured Analysis and 
System Specification. Prentice-Hall, 1979. When applying traditional programming, it is 
possible to optimize a programming code with respect to real-time requirements, and it is 
easier to generate concise error messages since the state of a whole control program is known 
at each point of time. 

Today many large systems are designed using an object oriented approach. This 
has several advantages over traditional approaches, including better possibility to cope with 
complexity and to achieve good maintainability and reusability. However, since error 
messages originating from a certain fault often reflect the control system design and 
architecture, it can be very difficult for an operator to understand which error message, of a 
plurality of error messages, is most relevant and closest to the real fault. 

One method to use for fault isolation is to compile a database, or use an expert 
system, see e.g. W. T. Scherer and C. C. White. A survey of expert systems for equipment 
maintenance and diagnostics. In S. G. Tzafestas, editor. Knowledge-based system diagnosis, 
supervision and control, pages 285-300. Plenum, New York, 1989, and S. Tzafestas and K. 
Watanabe. Modem approaches to system/sensor fault detection and diagnosis. Journal A, 
3 l(4):42-57, December 1990. But a highly configurable system has the disadvantage that 
every installation of the control system needs a new database. Also, when changes are made 
in the system itself, it can render an extensive database useless. 

Another possibility for fault isolation is to use a model that is tailored for fault 
isolation. Some general examples can be found in W. Hamscher, L. Console, and J. de Kleer, 
editors. Readings in model-based diagnosis. Morgan Kaufmann Publishers, San Mateo, CA, 
1992, and M. Sampath, R. Sengupta, S. Lafoitune, K. Sinnamohideen, and D. Teneketzis. 
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Diagnosability of discrete-event systems. IEEE Transactions on Automatic Control, 
40(9):1555-1575, September 1995. 
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The advantage of using a specialized model is that such a model can contain 
exactly the information needed for the fault isolation, and the fault isolation procedure can be 
much simplified. The disadvantage is that this model must be maintained separately. To 
maintain such a model is not trivial and a great deal of work may be needed to keep the model 
5 up to date. 

Exception handling mechanisms are intended to help improve error handling in 
software, to make programs more reliable and robust They are language constructs that 
facilitate error handling outside of the conventional program flow and at the appropriate level. 
The exception constructs might also support a programmer in providing more information to 
the error handler code than available through the conventional object interface, to facilitate 
error recovery. 

Exception handling mechanisms are in their nature low level constructs, and as 
such address a fault handling problem from the bottom up. It is known from to note, as 
pointed out in R. Miller and A. Tripathi "Issues with exception handling in object-oriented 
systems", in Proceedings of 1 1th European Conference on Object-Oriented Programming 
(ECOOP97), Jyvaskyla, Finland, June 1997, that the goals of exception handling often stand 
in direct conflict with the goals of an object oriented architecture, the very same goals of 
encapsulation and modularity that cause the fault propagation problem addressed in the 
present invention. 

Summary of the described invention 
The present invention provides a model based method for fault isolation by 
introducing an extra layer between an operator and/or process control means and actual error 
messages produced by a control system. This approach can be summarized as having a liberal 
error message sending policy for individual objects, with a structured signature for the 
messages, and a clustering method to isolate fault scenarios. A model of the system is then 
used to isolate the object/objects closest to the fault in each fault scenario. The messages) 
closest to the fault and hence most relevant, can then be presented to, for example, an 
operator, that can take measures to deal with a fault more easily man previously known in 
accordance with the present invention. 

In order to overcome disadvantages and problems related to fault isolation today, for 
example, for processes involving an industrial robot, the present invention sets forth a method 
provided for isolating a fault from a plurality of error messages in a system tog of a process 
controlled by an object oriented program. 
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In an embodiment of the present invention it comprises the steps of, 

- creating a predefined signature for each error message of said plurality of error messages, 

- fonning a time period to frame from said plurality of error messages the error messages 
belonging to a fault scenario of said fault, 

5 - clustering said error messages that fall within said time period, 

- forming from said signatures of said clustered error messages a base graph, which is a 
formal representation of said fault scenario, carrying cause-effect relation information of 
said -error messages, 

- analyzing said base graph rules to isolate said fault, and 

10 - reporting information containing said fault to a control means, which provides measures 
taking care of the fault 

Each of said signature is brought toeontain information on complaint and 
complainer in an embodiment of the invention. 

An explanation model is formed from an underlying assumption of communication 
15 between system entities needed to cany out a specific task, which system entities comprises 
objects, packages and threads, and having said signatures defined on the base of said 
explanation model in an embodiment of the invention. 

The mentioned time period starts with the first error message in a fault scenario and 
ends with a stop message in an embodiment of the invention. 
20 Reported information in accordance with an embodiment of the present invention is 

containing a reliability estimate of said fault. 

In one embodiment of the present invention, in case of an inconclusive base graph, 
said analyzing step further comprises the step of; 

- extending said base graph with the help of a system model, containing information on the 
25 dependency relation between system entities, and 

- analyzing said extended base graph with a set of predetermined explanation rules to 
isolate said fault 

Another embodiment of the present invention comprises that the system model is 
formed using the "Unified Modeling Language <UML) f \ 
30 Still another embodiment of the present invention includes the use of a method 

according to the above in a process involving an industrial robot. 

Further, the present invention provides a system for isolating a fault from a plurality 
of error messages in a system tog of a process controlled by an object oriented program. 
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Each error message of said plurality of error messages comprises in a preferred 
embodiment of the present invention a predesigned signature and that an interface is located 
between the program and a means for process control, said interface comprises means for 
clustering error messages belonging to a fault scenario of said fault, means for forming from 
5 said signatures of said clustered error messages a formal representation, a base graph, of said 
fault scenario, means for analyzing said formal representation to isolate said fault, and means 
for reporting information containing said fault to said process control means, which provides 
measures taking care of the fault 

Another embodiment of the present invention includes that the signature comprises 
10 information on complaint, complainer and complainee, and that said signature is formed from 
an explanation model comprising an underlying assumption of communication between 
system entities needed to cany out a specific task, said system entities comprise objects, 
packages and threads. 

The time period starts with the first error message in said fault scenario and ends 
15 with a stop message in one embodiment of the invention. 

A still further embodiment provides that the system comprises means for establishing 
a reliability estimate of said fault and that said fault information contains said estimate. 

In case of said base graph being inconclusive the system further comprises, in one 
embodiment of the invention, means for extending said base graph with the help of a system 
20 model, containing information on the dependency relation between system entities, and means 
for isolating said fault from said extended graph by using predetermined explanation rules. 

The "Unified Modeling Language (UML)" is used to form said system model in a 
preferred embodiment of the invention. 

Yet another embodiment provides the use of the system in accordance with the above 
25 described system in a process involving an industrial robot 

A further embodiment of the present invention is set forth by a computer program 
code element comprising computer code means for enabling a processor to isolate a fauk from 
a plurality of error messages in a system log of a process controlled by an object oriented 
program. 

30 It enables a processor *o carry out the steps of; 

- forming an interface between said program producing the plurality of error messages and a 
process controlling means, 

■ creating a cluster from said plurality of error messages belonging to a fauK scenario of 
said fault, 
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- analyzing said cluster using predefined error message signatures and a predetermined 
system model to isolate said fault, and 

- reporting information containing said isolated fault to said process controlling means, 
which provides measures taking care of the fault 

5 One embodiment provides the code element contained in a computer readable 

medium. 

Another embodiment of the invention provides that the code element is supplied at 
least in part over a network such as the Internet 

Also set forth through the present invention is a computer readable medium. 
10 Whereby it contains a computer program code element comprising computer code means for 
enabling a processor to isolate a fault from a plurality of error messages in a system log of a 
process controlled by an object oriented program, by which said processor is enabled to cany 
out the steps of: 

- forming an interface between said program producing the plurality of error messages and 
15 a process controlling means, 

- creating a cluster from said plurality of error messages belonging to a fault scenario of 
said fault, 

- analyzing said cluster using predefined error message signatures and a predetermined 
system model to isolate said fault, and 

- reporting information containing said isolated fault to said process controlling means, 
which provides measures taking care of the fault 

Brief description of the drawings 
For a more complete understanding of the present invention and for further 
objectives and advantages thereof, reference may now be made to the following description 
taken in conjunction with the accompanying drawings, in which: 

Fig. 1 schematically illustrates a conventional prior art message system with a 

system log; 

Fig. 2 schematically illustrates a message system for fault isolation in 
accordance with the present invention; 

Fig. 3 schematically illustrates graphical notations for different error messages 
in accordance with an embodiment of the invention; 

Fig. 4 schematically illustrates a flow chart over an analysis procedure in 
accordance with the present invention; 
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Fig. 5 schematically illustrates four possible properties of a base graph node in 
accordance with the present invention; and 

Fig. 6 schematically illustrates a generic example of a UML traversal algorithm. 

Table 

5 Table 1 on the last page of the present description illustrates server chains 

returned by an UML traversal algorithm for the model in accordance with Fig. 6. 

Detailed description of preferred embodiments 
The features of the present invention provide a more abstract view regarding 
fault isolation then prior art, and addresses the problem, mainly fault propagation, fiom a top- 
10 down perspective. It is not intended to replace low level error handling, but to be used in 
conjunction with low level error handling in some form. It can, e.g., be a disciplined use of 
return codes or full-fledged exception handling mechanisms. 

In object oriented design, encapsulation and modularity are fundamental and 
important design goals for reuse, maintenance and complexity reasons. An often conflicting 
15 design goal lies in the need to suppress unnecessary, propagating, error messages and 

eventually give an operator and/or process control means a concise picture of a fault situation, 
in order to act upon it 

The issue is how a configurable and safety critical control system with an 
object-oriented architecture handles and isolates run-time faults and alarms, and specifically 
20 the problems that arise due to the object oriented structure and complexity of the control 
system itself. 

The control system can be multi-threaded, with several concurrent tasks 
communicating both asynchronously and synchronously. The objects in the system according 
to the present invention are both pure software objects as well as objects corresponding to 

25 hardware. The invention is concentrated mainly on fault handling for a folly operational 
system, and will not discuss installation and startup problems. 

With the term fault, a run-time change or event is meant here, normally in 
hardware, that eventually causes the system to abort conventional operation. When a fault 
occurs during conventional operation, the system often generates a large number of error 

30 messages, for example, bursts of errors. Error messages are sent by individual objects to 
notify an operator or a means for operating when the object has detected an «rror condition. 
The individual object does not in general know how dose it is to the real fault, or if sufficient 
reporting is already taking place, and hence whether it should report to the operator or not. 
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Fig. 1 schematically illustrates a conventional prior art message system with a 
system log where box or block 100 depicts input signals from sensors, motors I/O equipment 
etc. Reference numeral 1 10 represent an object oriented control system controlling an 
arrangement such as an industrial robot and checking for messages in a system log 120 
5 depending on its input signals. If there are messages corresponding to the processing of 
signals by the control system 1 10, messages regarding faults or any other message that has to 
be presented to an operator of the system/arrangement are output as information for 
maintenance etc. Prior art control systems and system logs are not designed to pinpoint 
primary faults or to sort them out from among a plurality of messages in order to determine 
10 maintenance, repair etc. for an operator of, for example, a complex control system, this is 
restricted to highly skilled engineers or the like. 

When an error condition is encountered, the affected object normally lets the 
proper interface method return with an error code to the calling object It might also decide to 
continue its conventional operation, e.g., in the case of an event driven threads main loop. The 
15 returned error code in turn can be regarded as an error-condition by the calling object If 

deemed appropriate by designers, the object registers an error message to a central log 120. If 
an error condition is deemed so serious by a detecting object that conventional operation 
cannot continue, a special asynchronous call is made that performs an emergency stop. 

For objects in a program close to each other it is possible to suppress error 
20 messages by information passing, but this is not always feasible - it is an explicit aim of 
object oriented modeling to encapsulate knowledge about the internal state of objects and to 
achieve independence between groups of collaborating objects <i.e., encapsulation and 
modularity). Moreover, the control system that is considered here is safety critical. In case of 
a serious fault, the first priority is to bring the system or arrangement to a safe state. Only then 
25 is it possible to start analyzing what may have caused the fault 

A primary concern is a situation where an operational system is normally 
running without direct supervision. Since the error messages stemming from a certain fault 
often reflects the control system design and architecture, itcan be very difficult for the 
operator to understand which error message that is most relevant and closest 120 to the real 
30 fault 

The present invention, as depicted in Fig. 2, schematically illustrates a message 
system for fault isolation, and sets forth a model based method for fault isolation by 
introducing an extra layer 210, 220, 230 between the operator or means for operating and the 
actual error messages 200 produced by the control system 1 10. The approach can be 
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summarized as having a liberal error message sending policy for individual objects, with a 
structured signature for the messages, and a clustering 210 method to isolate <fault) scenarios. 
A model 220 of the system is then used to isolate the object/objects closest to the fault in -each 
fault scenario (cluster 1 and cluster 2 in Fig. 2). The message(s) closest to the fault {Primary 
fault message 1 and Primary fault message 2 in Fig. 2) and hence most relevant, can then be 
presented to an operator or means for operating. 
A basic terminology is now established: 

Definition: Control system tng 

The system log 200 constitutes a list of events recorded by the system in 
chronological order in the form of messages. 

Eventually the terms event and message are used interchangeably. 
Definition: Fault 

A fault is a run-time change or event, often in hardware 100 that eventually 
causes the system to deviate from normal operation. 

Definition; Error message 

An error message is a message sent by an individual software object to the 
operator, as response to a detection of an error condition for that object 

The kind of error condition that can arise in an object is closely related to the 
refinements of the error message, which is further discussed below 

Definition: Fault scenario 

A fault scenario consists of all instantiated system entities and events involved 
in the origin and manifestation of a specific fault. 

Examples of system entities are objects, links, threads, hardware equipment and 
physical connections. 

The system log may contain messages belonging to several different fault 
scenarios. In order to perform analysis of a single fault scenario, messages that have occurred 
during the same fault scenario are clustered 200, 2 10 together according to the present 
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inventioiL A time period is formed to frame from a plurality of error messages the error 
messages belonging to a fault scenario of the fault 

A clustering 210 method for clustering the error messages that fall within the 
time period according to the present invention is based on maximal time span of a cluster here 
called cluster period framed divider events. A divider event indicates that a new cluster begins 
with this event even if the time period is valid. Each cluster is identified and passed on for 
analysis 230. An example of a basic clustering algorithm is found below. 

The Clustering algorithm pseudn cn<te : 
Name: Clustering 
Input: A system log 
Output: A stream of clusters 

WHILE new event exist OR time out/end of file DO 
IF event exist THEN 
IF event in divider group THEN 
Analyze current cluster. 
Create new cluster and add current event 
ELSE 

IF cluster exists THEN 
IF current event (time) - start event (time) 
< Cluster period 
THEN 

Add current event to cluster 
ELSE 

Analyze current cluster. 
Create new cluster and add current event. 
ELSE 

Create new cluster and add current event 
ELSE 

Analyze current duster. 

Design parameters for clustering are events in a divider group and the length of 
a cluster period. An adaptive scheme could be considered with, e<g., increasingly laiger<or 
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smaller) cluster periods or concatenation of clusters. 

To use a specific explanation model 220 on a fault scenario, it is necessary to 
know how the error messages in the log relate to the chosen explanation base(s). The general 
semantics of this relationship, induced by the demands of an OO architecture (Object 
Oriented, OO), is that an error message is a complaint from an (instantiated) entity about an 
(instantiated) entity. The complainee entity might be the same as the complaining entity 
(internal error), or the complainee might be unknown for OO architecture reasons. The two 
entities need in general not be of the same kind, e.g., an object might complain on a thread. 
An error message signature is complainer and complainee information for an error message. 
An error message signature is present in error messages, comprising information of a sending 
source object and thread, and if possible, the reason for the error message in form of 
information on which system entity that failed to perform a requested service. 

An error message signature for objects as explanation base is detailed below. 
An ordinary method call to another object that returns with an error code is the 
most common example of an error condition that should result in a relational error message 
with known complainee. The complainee may be unknown when the responsible object is not 
known to the complainer for encapsulation or modularity reasons. 

A corresponding error condition could be a detected error in hardware or 
external equipment which the object encapsulates knowledge of. 

A graphical notation for the different error message types is introduced and 
depicted in Fig. 3. Boxes or blocks 30, 32, 34, 36 represent objects. A relational error message 
with known complainee is denoted with an arrow 38 between the complainer 30 and the 
complainee 32. A specific error message uses an arrow to self annotated block 34 with the 
letter "s" A relational error message with unknown complainee also uses an arrow to self 
annotated block 36 with the letter V. It should be noted that the two types of arrows to self 
are semantically different. The relational error message with unknown complainee could for 
example be more suggestively drawn as an arrow ending in a question mark. The notation is 
chosen,for simplicity reasons. 

The information obtainable from error messages adhering to OO design rules are 
not always enough to form a complete and coherent overview picture of a fault scenario. To 
complement the error messages, a system model 220 is used, that contains information on 
how system entities, such as software entities, depend on each other to perform its tasks. The 
model 220 reflects the current system design. 
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An explanation model is the underlying assumptions needed when using a 
specific type of system entity as the base for fault isolation. System entity types that can be 
used as base for explanation models are, e.g., objects, packages and threads (tasks). A main 
idea is to use the structure of the system, manifested in entities as mentioned, to establish 
cause-consequence relationships between error messages. Examples of model structures are 
UML class diagrams, object diagrams and task diagrams, Object Management Group. UML 
Notation Guide, version 1.1. doc no ad/97-08-05, September 1997. 

A system entity often has a certain type, from which several individuals can be 
instantiated, as for class - object and thread type - thread instance. Error messages ofcouree 
come from instantiated entities. The system model though, does not need to contain 
information on instances, but only on types, as for example UML class diagrams. Using the 
model to "guess" probable dependencies between instances would make such an analysis less 
reliable. This fact can be used to estimate the reliability of the analysis. A main goal is a 
cause-consequence relationship between all error messages in a fault scenario, preferably with 
one unique root cause error message. This is a first measure of reliability. 

A design parameter for the analysis is the search depth to use in the model, i.e M 
how many "silent" entities are to be assumed part of the fault scenario. Another design 
parameter is priority based addition of cause-consequence relationships using the system 
model. The priority can be based on properties of the instantiated entities or properties 
assigned to entities in the model. 

For the sake of simplicity in the following presentation, an algorithm is detailed 
below for the specific explanation model that uses only objects as base. The explanation 
model is, in short, that the problems an object experiences, reported for example by error 
messages, can be explained by the objects dependence on other objects and the knowledge 
and resources they administer. This is a suitable assumption when the interest is mainly in 
hardware induced faults. A more complete explanation model would, e.g., use both objects 
and threads as explanation base, and hence also take dependencies between concurrent threads 
in account. 

An analysis can be broken down into three main steps as schematically 
illustrated in Fig. 4 through a flow chart over an analysis procedure in accordance with the 
present invention. 

The forming of a rudimentary partial order directly from the clustered 400 error 
messages can be accomplished given the message signature. It can be represented as a 
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directed graph, which henceforth will be called the error message graph, base message graph 
410 or base graph. 

An error message graph can possibly be modified by a graph extension 420. All 
added edges will be relational, since a goal with graph extensions 420 is to make the graph 
connected. The edges between objects, both from the log and derived, should be read as 
"complaint about". A formed base graph consists of system entities describing how they 
depend on each other in a current fault scenario. The base graph is constructed using 
information in an error message signature. 

The purpose of graph extensions 420 is to make the base graph connected and 
acyclic, with a unique root node. A base graph is used to isolate a probable primary fault, 
using the semantics provided by a suitable explanation model 220. 

Objects in the graph are classified in terms of their error messages, or rather the 
edges they are connected to. Since there are three kinds of error messages <or edges), two 
involving one object and one involving two objects, there are four independent ways an object 
can be connected to an edge. The objects in a base graph will be classified according to this, 
as described below. It also provides a chosen notation for the object properties. 

Noderobiecfi classification and nntatinn 



U Is complainer, unknown complainee 

Cr Is complainer, known complainee 

Ce Is complainee 

S Has specific error message 



Each object in a base graph can of course have any combination of these 
properties and, in analogy with the bitwise or-operator in C and Oh-, the combination of 
properties will be denoted, for example, as S. This expression denotes a complainee with a 
specific error message, which would make it a good candidate for being close to the real fault 
See Figure 5, which schematically illustrates four possible properties of a base graph node in 
accordance with the present invention, for an object 30 with classification U. If the number of 
edges of a specific type which an object is connected to is not taken into account, the total 
number of possible object classifications is 15. 

The strategy -employed becomes straight forward, and a special case chose the 
nodes in the base graph that do not have a complainee or a specific error message, i.c.,no one 
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to "blame m , use the model to find possible complainees already in the base graph, and extend 
the graph accordingly. 

With such a strategy in mind, the 15 object classifications are divided into three 
main groups. (The classifications from the two first groups are named well formed, in contrast 
5 to the classifications from the last group.) 

Three node fobiecrt classification gamps, 



Needs complainee: U, U| Ce and Ce 

OK as it is: S, S| Ce, Cr, U| Cr, U| Ce| Cr andCe| Cr 

10 Undesired cases: U| S, S| Cr, U| S| Ce, U| S| Cr, S| Ce| Cr 

(Probably bad design) or UlS|Ce|Cr 



A pseudo code for a possible "ExtendGraph" algorithm and for the whole graph 
becomes as below. The graph extension for an individual object has here been located in a 
15 below subroutine called extendGraph. 
The ExtgnyKrmph aleorithm 

Name: ExtendGraph 
Input: An error graph 
A UML model 
Output: An extended (or unchanged) error graph 
ExtendGraph(REF Graph) { 
for (all objects in Graph) 
if (object type is U, U|Ce or Ce) 
call extendGraph(object) 

if (object type is S, S|Ce, Cr, U|Cr, UjCe|Cr or Ce|Cr) 
do nothing 

if (object type is U|S, SjCr, UfSJCe, U|S|Cr, SjCejCr or U|S|Ce|Cr) 
issue warning 
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The graph extension for an individual object has here been located in a below 
soubroutine named extendGraph. 

The extendGraph pseudocode 

5 

Name: ExtendGraph 
Input: A base graph 
A UML model 
Output: An extended (or unchanged) base graph 
10 ExtendGraph(REF Graph) { 
for (all objects in Graph) 
if<object type is U f U|Ce or Ce) 
call extendGraph(object) 

if (object type is S, S|Ce,Cr, U|Cr, UjCejCr or Ce|Cr) 
IS do nothing 

if (object type is UfS, S|Cr, U|S|Ce, UjSjCr, S|Ce|Cr orU(S{Ce|Cr) 
issue warning 

) 

20 When it is decided to -extend the graph for a specific object, the UML model, see 

430 in Fig. 4, is used to enumerate all possible servers of that object The object for whkh 
complainees are searched will be named the original search object. The class corresponding to 
the original search object will be named the original search class. 

If there is an object already in the base graph that corresponds to the found 

25 server class, this information is used to "guess" 1 that this is a valid connection in the current 
fault scenario, and extend the graph. Since the search relies on association information, in 
contrast to instantiated links, and the found servers are of course only classes, it is not known 
for certain if an object corresponding to the class is active in the current fauk scenario. 

The search in the UML model 430 provides a chain of classes -connected by 

30 navigable associations, where the first class is always the original search class or a subclass of 
the original search class. Such a chain is called a server chain. A server chain is schematically 
illustrated in Fig. 6 as a generic example of a UML traversal algorithm. 
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The extension of the graph consists of creating nodes 60 and edges 62 
corresponding to the classes and associations in the server chain. Pseudo code for the 
algorithm is presented above. The enumeration of servers using the UML model is performed 
in a breadth first fashion and is described in detail below. 

Since the software model might be very large, and since it is not to rely to 
heavily on static class information, the search depth in the server enumeration can be limited. 
The search depth is the number of associations used, to get from the original class to the last 
server in the server chain. 

If the first requirement that the graph is connected and acyclic is fulfilled, then a 
measure of reliability 440 could be 

Norigtnal edges 



Nn|ded edges**" Norigtnal edges 

which is 1 if no edges 62 needed to be added. 

The extendGraph algorithm can be improved by introducing a priority on the 
objects in the graph if there are several objects that could serve as a complainee for the 
original search object. A natural priority is to first consider nodes with classification from the 
"OK as is'^group as derived complainees. If that fails, nodes are considered with 
classification from the "Needs complainee^-group. This strategy requires either two 
traversal of the UML model, or a temporary storage of server chains with their last element 
corresponding to "Needs complainee" nodes. 

Because superclasses of the searched servers, both the original and the others, 
are not returned, it has to be checked not only if the returned servers themselves are in the 
graph, but also if there is a superclass of the returned server already in the graph. The reason 
for this inconvenience is that it is desirable to avoid adding a superclass of a server to keep the 
graph as specific as possible, but if the superclass is already present due to error messages or 
other graph extensions, it should be used when extending. 

A class for which servers are sought, will be called a search class, and theclass 
for which the search started is called the original search class. 

A server chain is a class chain consisting of classesconnected by navigable 
associations in the UML model. The first class is always the original search class or a subclass 
of the original search class. The search depth of the search chain is defined as the number of 
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associations used to get from the original search class to the last server in the chaia This 
notion of closeness of classes will sometimes be referred to as the association norm, even 
though it of course strictly speaking is not a norm. 

Since the algorithm returns a stream of server chains, it is defined in two parts, 
5 GetFirstServer and GetNextServer. Pseudo code for the two parts are presented below. 
The UML traversal algorithm pseudocode 

Name: GetFirstServer}, GetNextServer 
Input A class (the original search class) 
10 Maximal search depth 
An UML model 
Output: A stream of server chains 
Returns: OK, FINISHED or ABORTED 

15 State variables (common to GetFirstServer and GetNextServer): 
CurrentChain // The server chain currently being extended 
CurrentClass // The class currently searched for servers 
Queue // Holds the server chains that should be further extended 

CurrentAssociationList // The UML associations of the current class 

20 Superclass // Superclass of the last returned server 

SubclassQueue // Subclasses of the last returned server 

GetFirstServei{className, REF ServerChain) { 
clear all state variables 
25 set tmpClass - the class corresponding to className 
if <no class OR several classes found) 
error, 

put tmpClass in tmpChain //One single element 
30 enqueue tmpChain in Queue 

For<all subclasses of tmpClass) // To any depth 
empty tmpChain 
put the subclass in tmpChain 
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enqueue tmpChain in Queue 



return GetNextServeifServerChain) 

5 

GetNextServer(REF ServerChain) { 

// If there are any server subclasses previously found, return one of them 
if<SubclassQueue nonempty) 
10 set tmpClass = dequeue SubclassQueue 
for (subclasses of tmpClass) 
enqueue subclass in SubclassQueue 
set tmpChain = CurrentChain 
add tmpClass to tmpChain 
15 set ServerChain = tmpChain 

if(not to deep), enqueue ServerChain in Queue 
return OK 

if (CurrentAssociationList empty) 
20 if Superclass != NULL) 

set CurrentClass = Superclass // N.B. CurrentCIassChain same as before, 
set Superclass = NULL 
else // Get a new class {chain) from the queue 
if (Queue empty), return FINISHED 
25 set CurrentCIassChain = dequeue Queue 

set CurrentClass » last element in CurrentChain 

if {superclass of CurrentClass exist AND 
CurrentClass was not enqueued as a subclass of server) 
30 Superclass = superclass of CurrentClass // Multiple inheritance not allowed. 

set CurrentAssociationList = all associations of CuirentClass 



// We have a valid association collection to search for servers 
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if (CurrentAssociationList not empty) 
tmpAssociation - pop CurrentAssociationList 
set tmpClass to the associated class of CurrentClass 
if (tmpClass is a server or CurrentClass AND 
tmpClass not equal to the second to last class in CurrentClassChain) 

set tmpChain =CuirentChain 
add tmpClass to tmpChain 
set ServerChain = tmpChain 

for (subclasses of tmpClass) // Only first level inheritance 

enqueue subclass in SubclassQueue 
if (not to deep), enqueue ServerChain in Queue 
return OK 

return GetNextServer(ServerChain) // CurrentAssociationList empty 

Fig. 6 illustrates a traversal order of the UML model when searching for 
possible complainees for an original search class named "red". 

An example is shown in Fig. 6, with a qukecomplicated class and all server 
chains produced by the algorithm for the original search class "red" in Table 1. The search 
depth used is 2. The numbering of the classes have the semantics <search depth>.<running 
number>. 

Both super- and subclasses of the original search class are searched for servers, 
see classes 1.6 respectively 1.7 and 1.8 in Table 1. Superclasses need to be searched because 
the associations are inherited, and subclasses need to be searched since the error message or 
graph extension that introduced the original search class might have pointed out a superclass 
of the class actually active in the fault scenario under consideration. 

All subclasses of a possible server are enumerated as possible servers in their 
own right, see 1.3 to 1.5 in Table 1. In these server chains, only the subclass and not the 
original server is present, since the association leading to a parent is considered inherited. 

Superclasses of a server are not returned, since the association leading to the 
server specifically excludes the superclass. For example, the classes 'VedSup" and "BP in 
Table 1 are not returned. Servers of the superclasses are returned, though, since that 
association is inherited. In the server chain, it is chosen not to include the superclass itself 
since the association is considered inherited. 
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In a preferred embodiment of the system of the present invention it is intended for 
isolating a fault from a plurality of error messages in a system log 200 of a process controlled 
by an object oriented program 1 10. Each error message of the plurality of error messages 
comprises a predesigned signature, and an interface is located between the program and a 
5 means for process control. Whereby the interface comprises means for clustering 2 i 0 error 
messages belonging to a fault scenario of the fault. Also, means for forming from said 
signatures of said clustered 210 error messages a formal representation, a base graph 410, of 
the fault scenario. Further it comprises means for analyzing 230 the formal representation to 
isolate the fault, and means for reporting information containing the fault to the process 
10 control means, which provides measures taking care of the fault 

Hence, the process control is able to maintain, repair, diagnose etc. the detected fault 
in a much quicker way than known through prior art. 

A signature comprises information on complaint, complainer and^complainee, and 
the signature is formed from an explanation model comprising an underlying assumption of 
15 communication between system entities needed to carry out a specific task. System entities 
comprise objects 60, packages and threads. 

Regarding the time period it starts with the first error message in the fault scenario 
and ends with a stop message in one embodiment of the present invention. 

Further in an embodiment of the invention the system comprises means for 
20 establishing a reliability estimate 440 of the fault and that the fault information contains the 
estimate. 

If the base graph 410 is inconclusive, the system further comprises, in one 
embodiment of the invention, means for extending it by providing a system model 220, 
containing information on dependency relations between system entities, and means for 
25 isolating the fault from the extended graph 420 by using predetermined explanation rules. 

In one embodiment of the system according to the present invention it is used in a 
process involving an industrial robot. 

Another embodiment of the present invention provides a computer program code 
element comprising computer code means for enabling a processor to isolate a fault from a 
30 plurality of error messages in a system log 200 of a process -controlled by an object oriented 
program 1 10. Thereby enabling a processor to carry out the steps of; 
- forming an interface between the program 1 10 producing the plurality of error messages and 
a process controlling means, 
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- creating a cluster 2 1 0 from said plurality of error messages belonging to a fault scenario of 
the fault, 

- analyzing the cluster 400 using predefined error message signatures and a predetermined 
system model 220 to isolate the fault, and 

- reporting information containing said isolated fault to said process controlling means, 
which provides measures taking care of the fault. The code element is in one embodiment 
of the present invention contained in a computer readable medium. 

Further, the code element is supplied at least in part over a network such as the 
Internet in an embodiment of the present invention. 

In another embodiment of the present invention, a computer readable medium is 
provided, containing a computer program code element comprising computer code means for 
enabling a processor to isolate a fault from a plurality of error messages in a system log 200 
of a process controlled by an object oriented program 110. Whereby the processor is enabled 
to cany out the steps of: 

- forming an interface between said program producing the plurality of error messages and 
a process controlling means, 

- creating a cluster 210, 400 from said plurality of error messages belonging to a fault 
scenario of the fault, 

- analyzing 230 the cluster 400 using predefined error message signatures and a 
predetermined system model 220 to isolate the fault, and 

- reporting information containing the isolated fault to the process controlling means, which 
provides measures taking care of the fault. 

Also, the system according to the present invention is able to conduct features as 
described above in a method 

Means used in the interface are, for example, software and software drivers, any 
kind of suitable display etc. and other means known to a person skilled in the art 

It is thus believed that the operation and the embodiments of the present 
invention will be apparent, for a person skilled in the art, from the foregoing description. 
While the method and interface shown or described has been meant as being preferred it will 
be obvious that various changes and modifications may be made therein without departing 
from the spirit and scope of the present invention as defined in the attached claims. 
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Table 1 



1.1 red -Al 

5 

1.2 red -BB1 

1.3 red -Fl 
10 1.4 red -El 

1.5 red -FF1 

1.6 red -Dl 

IS 

UredSub -CI 

1.8 redSubsub -A2 

20 2.1 red -Al - A2 

2.2 red -BB1 - BB2 

2.3 red -BB1 - B2 

25 

2.4 red -El - E2 
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Claims 

1. A method for isolating a fault from a plurality of error messages in a system log 
(200) of a process controlled by an object oriented program <1 10), characterized by the steps 

5 of; 

- creating a predefined signature for each error message of said plurality of error messages, 

- forming a time period to frame from said plurality of error messages the error messages 
belonging to a fault scenario of said fault, 

- clustering (210) said error messages that fall within said time period, 

10 - forming from said signatures of said clustered (210, 400) error messages a base graph 
(610), which is a formal representation of said fault scenario, carrying cause-effect 
relation information of said error messages, 

- analyzing (230) said base graph to isolate said fault, and 

• reporting information containing said fault to a control means, which provides measures 
15 for taking care of the fault. 

2. A method according to claim 1, characterized in thaUach of said signatures is 
brought to contain information on complaint and complainer. 

3. A method according to claim 1 or 2, characterized in that an explanation model is 
formed from an underlying assumption of communication between system entities needed to 
carry out a specific task, which system entities comprises objects, packages and threads, and 
haying said signatures defined on the base of said explanation mode!. 

4. A method according to any of the preceding claims, characterized in that said 
time period starts with the first «rror message in said fault scenario and ends with a stop 
message. 

5. A method according to any of the preceding claims, characterized in that said 
reported information is containing a reliability estimate<430) of said fault. 

6. A method according to any of the precedingxlaims, characterized in that in case 
of an inconclusive base graph (410) said analyzing step further comprises the step of; 

- extending (420) said base graph with the help of a system model {220), containing 
information on the dependency relation between system entities, and 

- analyzing {230) said extended {420) base graph with a set of predetermined explanation 
rules to isolate said fault. 

7. A method according to claim 6, characterized in that said system model <220) is 
formed using the "Unified Modeling Language <UML)" (430). 
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8. Use of a method according to any of the proceeding claims in a process involving 
an industrial robot 

9. A system for isolating a fault from a plurality of error messages in a system log 
(200) of a process controlled by an object oriented program, characterized in that each error 

5 message of said plurality of error messages comprises a predesigned signature and that an 
interface is located between the program (1 10) and a means for process control, said interface 
comprises means for clustering (210, 400) error messages belonging to a fault scenario of said 
fault, means for forming from said signatures of said clusterred (210, 400) error messages a 
formal representation, a base graph (410), of said fault scenario, means for analyzing (230) 
10 said formal representation to isolate said fault, and means for reporting information contiiining 
said fault to said process control means, which provides measures for taking care of the fault. 

10. A system according to claim 9, characterized in that said signature comprises 
information on complaint, complainer and complainee, and that said signature is formed from 
an explanation model comprising an underlying assumption of communication between 

15 system entities needed to carry out a specific task, said system entities comprise objects, 
packages and threads. 

11. A system according to claims 9-10, characterized in that the time period starts 
with the first error message in said fault scenario and ends with a stop message. 

12. A system according to claims 9-1 1, characterized in that the system comprises 
20 means for establishing a reliability estimate (440) of said fault and that said fault information 

contains said estimate. 

13. A system according to claims 9-12, characterized in that in case of said base 
graph being inconclusive the system further comprises means for extending {420) said base 
graph (410) with the help of a system model (220), containing information on the dependency 

25 relation between system entities, and means for isolating said fault from said extended graph 
(420) by using predetermined explanation rules. 

14. A system according to claiml3, characterized in that the "Unified Modeling 
Language (UML) W (430) is used to form said system model (220). 

15. Use of a system according to claims 9-14 in a process involving an industrial 

30 robot 

16. A computer program code element comprising computer code means for 
enabling a processor to isolate a fault from a plurality of error messages in a system log (200) 
of a process controlled by an object oriented program (1 10), characterized by enabling a 
processor to cany out the steps of; 
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- forming an interface between said program (1 10) producing the plurality of error messages 
and a process controlling means, 

- creating a cluster<210) from said plurality of error messages belonging to a fault scenario 
of said fault, 

- analyzing (230) said cluster{400) using predefined error message signatures and a 
predetermined system model (220) to isolate said fault, and 

- reporting information containing said isolated fault to said process controlling means, 
which provides measures for taking care of the fault 

17. A computer program code element according to claim 16 contained in a 
computer readable medium. 

18. A computer program code element according to claim 16 supplied at least in part 
over a network such as the Internet 

19. A computer readable medium, characterized in containing a computer program 
code element comprising computer code means for enabling a processor to isolate a fault from 
a plurality of error messages in a system log (200) of a process controlled by an object 
oriented program (1 10), by which said processor is enabled to carry out the steps of: 

- forming an interface (400, 410, 420, 430, 440) between said program (1 10) producing the 
plurality of error messages and a process controlling means, 

- creating a cluster (2 1 0) from said plurality of error messages belonging to a fault scenario 
of said fault, 

- analyzing (230) said cluster (400) using predefined error message signatures and a 
predetermined system model (220) to isolate said fault, and 

- reporting information containing said isolated fault to s aid process controlling means, 
which provides measures taking care of the fault 
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